forked from llvm/llvm-project
[pull] main from llvm:main #5546
Merged
See the attached test case.
PR #112540 as the reference.
…#124754) Enable device code ASan instrumentation for OpenMP offload applications using the option '-fsanitize=address'.
…124247) Profiling of the LLVM test-suite reveals that a significant portion of loop nests, specifically 14,090 out of 139,323, were identified as non-viable candidates for transformation, which led to the transform exiting from isComputableLoopNest() without taking any action. More importantly, dependence information was computed for these loop nests before reaching isComputableLoopNest(), even though that function does not require DI and relies solely on scalar evolution (SE). To improve compile-time efficiency, this patch moves the call to isComputableLoopNest() earlier in the control flow, thereby avoiding unnecessary dependence calculations. The impact of this change is visible on the compile-time-tracker: the overall geometric mean improves by 0.11%, and the lencode benchmark gets a more substantial benefit of 0.44%. This improvement can be tracked in the isc-ln-exp-2 branch under my repo.
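The reordering can be sketched as a small model. This is hypothetical Python, not LLVM code. `LoopNest.computable` stands in for the result of isComputableLoopNest(), and `compute_dependence_info` stands in for the dependence analysis whose cost the patch avoids for non-viable nests.

```python
from dataclasses import dataclass


@dataclass
class LoopNest:
    name: str
    computable: bool  # hypothetical stand-in for isComputableLoopNest()'s result


@dataclass
class Stats:
    di_computations: int = 0  # how many times dependence info was built


def compute_dependence_info(nest: LoopNest, stats: Stats) -> dict:
    # Hypothetical stand-in for the expensive dependence analysis (DI).
    stats.di_computations += 1
    return {"nest": nest.name}


def run_transform(nest: LoopNest, stats: Stats) -> bool:
    # Cheap check first: it relies only on scalar evolution, so a
    # non-viable nest exits here before any dependence computation.
    if not nest.computable:
        return False
    dep_info = compute_dependence_info(nest, stats)
    # Legality/profitability checks that actually need dep_info would follow.
    return bool(dep_info)
```

Given one viable nest and two non-viable ones, dependence information is built exactly once. Before such a reordering, it would have been built three times.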
…turns (#125280) This patch does two things. 1. Previously, when checking driver arguments, we emitted an error for unsupported values of `-mbranch-protection` when using the pauthtest ABI. The reason for that was that ptrauth-returns is enabled as part of pauthtest. This patch changes the check against pauthtest to a check against ptrauth-returns. 2. Similarly, check for values of the following function attribute that are unsupported with ptrauth-returns: `__attribute__((target("branch-protection=XXX")))`. Note that the existing `validateBranchProtection` function is used, and the current behavior is to ignore the unsupported attribute value, so no error is emitted.
…fixes. Re-enables compact-unwind support in JITLink, which was reverted in b04847b due to buildbot failures. The underlying cause of the buildbot failures was the lack of compact-unwind registration support on older Darwin OSes. Since the CompactUnwindManager pass now removes eh-frames by default, we were left with unwind-info that could not be registered. On x86-64, where eh-frame info is produced by default, the solution is to fall back to using eh-frames. On arm64 we simply can't support exceptions on older OSes. This patch updates the EHFrameRegistrationPlugin to remove the compact-unwind section (__LD,__compact_unwind) when installed, forcing use of eh-frames whenever the EHFrameRegistrationPlugin is used. In LLJIT, the EHFrameRegistrationPlugin continues to be used for all non-Darwin platforms, and will be added on Darwin platforms when a CompactUnwindRegistrationPlugin instance can't be created (e.g. due to missing support for compact-unwind info registration). The lit.cfg.py script is updated to check whether the host OS's default unwind info supports JIT registration, allowing tests to be disabled for older Darwin OSes on arm64.
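The plugin-selection rule described for LLJIT can be modeled as below. This is a hypothetical Python sketch: the function names are invented, and plugin names are plain strings rather than real ORC objects. It captures only the choice between the two registration plugins and the section stripping. It does not model the arm64 limitation on older OSes, where eh-frames are not produced by default and so exceptions remain unsupported.

```python
COMPACT_UNWIND_SECTION = "__LD,__compact_unwind"


def select_unwind_plugins(is_darwin: bool,
                          compact_unwind_plugin_available: bool) -> list[str]:
    # Non-Darwin platforms keep using eh-frame registration.
    if not is_darwin:
        return ["EHFrameRegistrationPlugin"]
    # Darwin: use compact-unwind registration when that plugin can be created.
    if compact_unwind_plugin_available:
        return ["CompactUnwindRegistrationPlugin"]
    # Older Darwin without compact-unwind registration: fall back to eh-frames.
    return ["EHFrameRegistrationPlugin"]


def install_eh_frame_plugin(sections: list[str]) -> list[str]:
    # Installing the eh-frame plugin drops the compact-unwind section,
    # so eh-frames are the only unwind info left to register.
    return [s for s in sections if s != COMPACT_UNWIND_SECTION]
```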
This library is provided by flang, not MLIR, so it should not be part of MLIR_LIBS. Fixes an issue introduced in #120966.
For convenience this patch drops nsw for `sub`. It also allows this fold with `ctlz_zero_undef`. Alive2: https://alive2.llvm.org/ce/z/VmvqSt
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure" printing is inconsistent with "Bottom Pressure" and "Top Pressure". For the former, the register classes begin on the same line as the heading, whereas for the latter they begin on a new line. Also, for the former, the first register class is on the same line but subsequent register classes are newline-separated. This is changed so that all register classes are on the same line. Before: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17 After: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17
2. After scheduling an instruction, don't print the pressure diff if there isn't one. Also rename UpdateRegP to UpdateRegPressure. E.g., Before: UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64 to UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12 to GPR32 -1 After: UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12 to GPR32 -1
3. Don't print excess pressure sets if there are none.
Previously the grammar tokens SimpleValue2 through SimpleValue9 were unreferenced. This ties them together so that the grammar makes more sense.
This commit moves the rotate builtin to the CLC library. It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n) intrinsic, for both scalar and vector types. The previous implementation was too cautious in its handling of the shift amount; the OpenCL rules state that the shift amount is always treated as an unsigned value modulo the bitwidth.
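Why `@llvm.fshl(x, x, n)` is a rotate can be checked with a small Python model. This is a sketch, not the CLC code. `fshl` follows the LLVM funnel-shift-left definition: concatenate the two operands, shift left by the amount modulo the bit width, and keep the high half. `rotate` applies the OpenCL rule that the shift amount is read as an unsigned value modulo the bit width. The helper names are illustrative.

```python
def fshl(a: int, b: int, amount: int, bits: int) -> int:
    # Funnel shift left: concatenate a:b (a in the high half), shift left by
    # amount modulo bits, and keep the high `bits` bits of the result.
    mask = (1 << bits) - 1
    shift = amount % bits
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << shift) >> bits) & mask


def rotate(x: int, n: int, bits: int) -> int:
    # OpenCL-style rotate: the shift amount is read as an unsigned value
    # (n & mask reinterprets a negative n in two's complement) modulo bits.
    mask = (1 << bits) - 1
    shift = (n & mask) % bits
    x &= mask
    if shift == 0:
        return x
    return ((x << shift) | (x >> (bits - shift))) & mask


def rotate_vector(xs: list[int], ns: list[int], bits: int) -> list[int]:
    # Vector types apply the same rule lane by lane.
    return [fshl(x, x, n, bits) for x, n in zip(xs, ns)]
```

Because both operands of the funnel shift are `x`, the bits shifted out on the left re-enter on the right, which is exactly a rotate. No extra clamping of `n` is needed, since the modulo is part of the intrinsic's definition.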
PR #124961 adds intrinsics for the tcgen05 alloc/dealloc PTX instructions. This patch adds NVVM Ops for the same. Tests are added to verify the lowering to the corresponding intrinsics in tcgen05-alloc.mlir file. PTX ISA link: https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-memory-alloc-manage-instructions Signed-off-by: Durgadoss R <[email protected]>
The Fortran libraries are not part of MLIR, so they should use target_link_libraries() rather than mlir_target_link_libraries(). This fixes an issue introduced in #120966.
These were referring to nonexistent grammar tokens instead of `Value`.
For consistency with input def handling.
Removed the TOSA quantization attribute used in various MLIR TOSA dialect operations in favour of using builtin attributes. Update any lit tests, conversions and transformations appropriately. Signed-off-by: Tai Ly <[email protected]> Co-authored-by: Tai Ly <[email protected]>
This PR moves the maximum number of threads per block and blocks per grid to the nvgpu dialect to avoid duplicated code. The limits are defined here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#features-and-technical-specifications-technical-specifications-per-compute-capability
Make lifetime management more explicit. We're only using this for CXXPseudoDestructorExprs for now but we need this to handle std::construct_at/placement-new after destructor calls later anyway.
When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be a warning
```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```
which is followed by a large number of characters in stdout. Fix this to avoid writing a large number of characters (3e5) to stdout.
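The warning's arithmetic is 8-bit two's complement: 143 and -113 are the same byte (143 - 256 = -113). The sketch below is a hypothetical Python generator, not the actual libdevice embedding script. It emits every byte in the range -128..127, so a `char` initializer never overflows, assuming `char` is signed on the target. Declaring the array as `unsigned char` would be another way to avoid the warning. The commit does not say which approach it takes, and the array name `blob` is illustrative.

```python
def to_signed_char(byte: int) -> int:
    # Reinterpret an unsigned byte (0..255) as an 8-bit two's-complement value.
    return byte - 256 if byte >= 128 else byte


def emit_c_array(name: str, data: bytes) -> str:
    # Writing each byte as a value in -128..127 keeps every initializer
    # within the range of a signed `char`, so no int -> char conversion overflows.
    body = ", ".join(str(to_signed_char(b)) for b in data)
    return f"const char {name}[] = {{{body}}};"
```

For example, `emit_c_array("blob", bytes([143, 1, 255]))` yields `const char blob[] = {-113, 1, -1};`.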
The previous implementation had false positive/negative cases in the analysis of loop-carried dependencies. A missed dependency was caused by incorrect analysis of address increments; this is fixed by stricter analysis of recursive definitions. See the added test swp-carried-dep4.mir. Excessive dependency detection is fixed by improving the formula that determines whether the address ranges to be accessed overlap. See the added test swp-carried-dep5.mir.
If the input contains an odd number of shuffled vectors, the last 2 shuffles are shuffled with the same first vector. This situation must be processed correctly: when the first vector is requested for the first time, extract it from the source vector; when it is requested the second time, reuse the previous result. The second vector should be extracted in both cases. Fixes #125269 Reviewers: topperc, preames Reviewed By: preames Pull Request: #125693
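The reuse rule can be illustrated with a hypothetical Python model. It is not the LLVM implementation, and `lower_shuffles` and `extract` are invented names. Each shuffle's first source is extracted on first request and reused afterwards, while the second source is extracted every time.

```python
from typing import Callable, Hashable


def lower_shuffles(pairs: list[tuple[Hashable, Hashable]],
                   extract: Callable[[Hashable], object]) -> list[tuple[object, object]]:
    """Materialize the operands of each two-input shuffle.

    pairs: (first_source, second_source) for each shuffle, in order.
    extract: produces a sub-vector for a source id (e.g. emits an extract).
    """
    extracted_first: dict = {}
    operands = []
    for first, second in pairs:
        # First operand: extract it on the first request, reuse it afterwards.
        if first not in extracted_first:
            extracted_first[first] = extract(first)
        v1 = extracted_first[first]
        # Second operand: always extracted.
        v2 = extract(second)
        operands.append((v1, v2))
    return operands
```

With `pairs = [(0, 1), (2, 3), (2, 4)]`, the last two shuffles share first vector 2. Only five extractions happen, because the second request for vector 2 reuses the earlier result.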
There were a bunch of spots in ROCDL.td where we were defining our own llvmBuilder call, which could have been generated using the default built-in one on LLVM_IntrOpBase. This commit cleans up such usages in the interest of potentially enabling ROCDL import in the future and of making best practices more obvious. The one breaking change is renaming WaitcntOp to SWaitcntOp, which should have minimal impact.
…lled from `.preinit_array`. (#113427) (#116121) Signed-off-by: funsafe-ptr <[email protected]>
Fix for a lit failure from #125737 Signed-off-by: Sarnie, Nick <[email protected]>
To establish a baseline for new tests mentioned in #117785, adding them here independently.
The threadset clause is part of free-agent threads, but free-agent threads also involve runtime implementation. So, a separate entry was added for the threadset clause and claimed for clang. --------- Co-authored-by: Sunil Kuravinakop <[email protected]>
If BOLT fails to locate DWO CU when using split DWARF, this signifies an issue with the input (missing .dwo) rather than an internal assertion.
…-V (#125905) This is a follow-up to 5df62bd. That commit should not have needed to make the vector.insert and vector.extract conversions to SPIR-V directly handle the static poison index case, as there is a fold from those to ub.poison, and a conversion pattern from ub.poison to spirv.Undef. However:
- The ub.poison fold result could not be materialized by the vector dialect (fixed as of d13940e).
- The conversion pattern wasn't being populated in VectorToSPIRVPass, which is used by the tests. This commit changes this.
- The ub.poison to spirv.Undef pattern rejected non-scalar types, which prevented its use for vector results. It is unclear why this restriction existed; a remark in D156163 said this was to avoid converting "user types", but it is not obvious why these shouldn't be permitted (the SPIR-V specification allows OpUndef for all types except OpTypeVoid). This commit removes this restriction.

With these fixed, this commit removes the redundant static poison index handling, and updates the tests.
Summary: This probably wasn't the intended result, but the code here causes OpenMP to always link in `ockl.bc` which was intentionally not linked. This results in the OCKL definitions conflicting with the OpenMP ones and also prevents them from being optimized out (Might be fixed with newer ROCm that actually builds the visibility correctly). I'm pretty sure the only reason this didn't break the tests is because we're smart and pass `-nogpulib` there to keep the environment from being poisoned with stuff like this.
…125732) Reapplies #122471 This is based on #125699, only the latest commit is relevant. With changes in this PR and the parent one, the previously reported failures in the Fujitsu(*) test suite should hopefully be resolved (I verified all the 14 reported failures and they pass now). (*) https://linaro.atlassian.net/browse/LLVM-1521
Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass samesign information to isImpliedViaOperations, and teach it to call CmpPredicate::getPreferredSignedPredicate, effectively making it optimize with samesign information.
…r uses (#124327) Remove the restriction that scheduling rematerialization candidates cannot have virtual reg uses. Currently, this only allows for virtual reg uses which are already live at the rematerialization point, so bring in allUsesAvailableAt to check for this condition. Because of this condition, the uses of the remats will already be live in to the region, so the remat won't increase live-in pressure. Add an expensive check to check this condition.
These aliases are never used, so we can ditch them.
…ing definition (#126061) Having them defined out-of-line results in a significant amount of boilerplate without improving readability, since they're just one or two lines long anyway. As a drive-by, add comments between the declarations to make them easier to distinguish.
…(y - z) + z even with Zicond. (#125772)" With the test changes. Original message: The Zicond version of this requires an li instruction and an additional register. Without Zicond we match this in a DAGCombine on RISCVISD::SELECT_CC. This PR has 2 commits. I'll pre-commit the test change if this looks good.
- Use `Emitter::OptClass` to invoke `InstrInfoEmitter::run` and eliminate the `EmitInstrInfo` function.
Since line zero means "no line information", when symbolizing a location (an address or an inline frame associated with the address) that has a line zero location, we shouldn't include other irrelevant data (like filename) in the result.
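A hypothetical Python sketch of this rule follows. It assumes an llvm-symbolizer-like two-line layout (the function name, then `file:line:column`) and uses `??:0:0` as the placeholder for an unknown location. The commit does not spell out the exact output format, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolizedFrame:
    function: str
    filename: Optional[str]
    line: int
    column: int = 0


def format_frame(frame: SymbolizedFrame) -> str:
    # Line 0 means "no line information", so the filename and column attached
    # to the location are not meaningful and are dropped from the output.
    if frame.line == 0:
        return f"{frame.function}\n??:0:0"
    return f"{frame.function}\n{frame.filename}:{frame.line}:{frame.column}"
```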
#125923) NaryReassociate would crash on expressions like the one in the added test that involved pointers where the size of the type was greater than the index width of the pointer, causing calls to SCEV's zext expression on types that didn't need to be zero-extended. This commit fixes the issue.
…25629) LLVM itself is generally moving away from using `undef` and towards using `poison`, to the point of having a lint that catches new uses of `undef` in tests. In order to not trip the lint on new patterns and to conform to the evolution of LLVM:
- Rename various ::undef() methods on StructBuilder subclasses to ::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all of them with PoisonOp

The remaining uses of `undef` are initializing `uninitialized` memrefs, explicit conversions to undef from SPIR-V, and a few cases in AMDGPUToROCDL where usage like

    %v = insertelement <M x iN> undef, iN %v, i32 0
    %arg = bitcast <M x iN> %v to i(M * N)

is used to handle "i32" arguments that are really packed vectors of smaller types that won't always be fully initialized.
…dSkipZero is not set (#126044) I ran into this while working on a different patch where I'm emitting a zero-valued DWARF enum field which shouldn't be skipped. This patch checks the (currently unused) `ShouldSkipZero` before deciding to skip printing this field. Based on git history this seems like an oversight from the initial refactor that introduced this. We have a similar check in `printInt`. Wasn't sure how to best test this, but tests in an upcoming patch rely on this functionality (see #126045). Currently the only place `ShouldSkipZero` is set to `false` is when emitting the `DW_LANG_` enum. But the language codes start at `0x1`. So it never exercised this codepath (and we should probably just make it not pass this parameter).
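The corrected behavior can be modeled in a few lines of hypothetical Python. `print_enum_field` and its parameters are illustrative and do not mirror the real DWARF printing code. The point is that a zero value is skipped only when the flag asks for it, instead of regardless of the flag.

```python
from typing import Optional


def print_enum_field(name: str, value: int, enum_names: dict[int, str],
                     should_skip_zero: bool = True) -> Optional[str]:
    """Return the text for one enum-valued field, or None if it is skipped."""
    # Zero is skipped only when should_skip_zero is set.
    if value == 0 and should_skip_zero:
        return None
    # Fall back to the raw hex value for codes without a known name.
    return f"{name}: {enum_names.get(value, hex(value))}"
```

With the flag left at its default, a zero-valued field disappears as before. Passing `should_skip_zero=False` keeps a legitimate zero enumerator in the output.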
#126104) The fix requires more investigation, and it's a test issue so reverting the product changes should not be necessary. Signed-off-by: Sarnie, Nick <[email protected]>
…#122275) Goals:
1. To add syntax and semantics to 'batch_matmul' without changing any of the existing syntax expectations for current usage. batch_matmul is still just batch_matmul.
2. To move the definition of batch_matmul from linalg OpDsl to the tablegen ODS infra.

Scope of this patch: to expose broadcast and transpose semantics on 'batch_matmul'. The broadcast and transpose semantics are as follows. By default, 'linalg.batch_matmul' behavior remains as is. Broadcast and transpose semantics can be applied by specifying the explicit attribute 'indexing_maps' as shown below. This is a list attribute, so if it is specified, the list must include all the maps.

Example Transpose:
```
linalg.batch_matmul indexing_maps = [
  affine_map< (d0, d1, d2, d3) -> (d0, d3, d1)>, //transpose
  affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
  affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<2x5x3xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
Example Broadcast:
```
linalg.batch_matmul indexing_maps = [
  affine_map< (d0, d1, d2, d3) -> (d3)>, //broadcast
  affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
  affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<5xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
Example Broadcast and transpose:
```
linalg.batch_matmul indexing_maps = [
  affine_map< (d0, d1, d2, d3) -> (d1, d3)>, //broadcast
  affine_map< (d0, d1, d2, d3) -> (d0, d2, d3)>, //transpose
  affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<3x5xf32>, memref<2x7x5xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
RFCs and related PR:
https://discourse.llvm.org/t/rfc-linalg-opdsl-constant-list-attribute-definition/80149
https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863
https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
#115319
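To make the indexing maps concrete, here is a plain-Python reference loop nest that reads each operand through a given map. It is a sketch, not the linalg implementation. It reads (d0, d1, d2, d3) as (batch, M, N, K), with d3 as the reduction dimension. It assumes the usual linalg behavior of accumulating into the `outs` operand. The helper names (`batch_matmul`, `a_transposed`, and so on) are hypothetical, and the shapes follow the examples in the commit message.

```python
from itertools import product


def zeros(nb: int, m: int, n: int) -> list:
    return [[[0] * n for _ in range(m)] for _ in range(nb)]


def at(tensor, idx: tuple):
    # Index a nested list with a tuple such as (b, k, m).
    for i in idx:
        tensor = tensor[i]
    return tensor


def batch_matmul(A, B, C, map_a, map_b, k: int):
    """Reference loop nest over (d0, d1, d2, d3) = (batch, M, N, K).

    The output map is (d0, d1, d2) in every example, so C is indexed as
    C[d0][d1][d2]; results accumulate into C.
    """
    nb, m, n = len(C), len(C[0]), len(C[0][0])
    for d0, d1, d2, d3 in product(range(nb), range(m), range(n), range(k)):
        C[d0][d1][d2] += at(A, map_a(d0, d1, d2, d3)) * at(B, map_b(d0, d1, d2, d3))
    return C


# Operand maps taken from the three examples.
def a_transposed(d0, d1, d2, d3):
    return (d0, d3, d1)


def a_broadcast(d0, d1, d2, d3):
    return (d3,)


def a_broadcast_2d(d0, d1, d2, d3):
    return (d1, d3)


def b_default(d0, d1, d2, d3):
    return (d0, d3, d2)


def b_transposed(d0, d1, d2, d3):
    return (d0, d2, d3)
```

A broadcast operand behaves like an explicitly materialized operand that repeats its values along the missing dimensions. For example, using a 1-D `A` with `a_broadcast` gives the same result as building a 2x5x3 copy of it and using `a_transposed`.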
Created by pull[bot] (v2.0.0-alpha.1)